OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A corpus for OCR research on mathematical expressions

Internal identifier: 001424 (Main/Exploration); previous: 001423; next: 001425

Authors: Utpal Garain [India]; Bidyut Baran Chaudhuri [India]

RBID : Pascal:06-0054256

Abstract

This paper is concerned with research on OCR (optical character recognition) of printed mathematical expressions. Construction of a representative corpus of technical and scientific documents containing expressions is discussed. A statistical investigation of the corpus is presented, and the usefulness of this analysis is demonstrated in the related research problems, namely, (i) identification and segmentation of expression zones from the rest of the document, (ii) recognition of expression symbols, (iii) interpretation of expression structures, and (iv) performance evaluation of a mathematical expression recognition system. Moreover, a groundtruthing format has been proposed to facilitate automatic evaluation of expression recognition techniques.

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A corpus for OCR research on mathematical expressions</title>
<author>
<name sortKey="Garain, Utpal" sort="Garain, Utpal" uniqKey="Garain U" first="Utpal" last="Garain">Utpal Garain</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road</s1>
<s2>Calcutta-700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta-700 035</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road</s1>
<s2>Calcutta-700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta-700 035</wicri:noRegion>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">06-0054256</idno>
<date when="2005">2005</date>
<idno type="stanalyst">PASCAL 06-0054256 INIST</idno>
<idno type="RBID">Pascal:06-0054256</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000416</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000371</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000443</idno>
<idno type="wicri:doubleKey">1433-2833:2005:Garain U:a:corpus:for</idno>
<idno type="wicri:Area/Main/Merge">001473</idno>
<idno type="wicri:Area/Main/Curation">001424</idno>
<idno type="wicri:Area/Main/Exploration">001424</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">A corpus for OCR research on mathematical expressions</title>
<author>
<name sortKey="Garain, Utpal" sort="Garain, Utpal" uniqKey="Garain U" first="Utpal" last="Garain">Utpal Garain</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road</s1>
<s2>Calcutta-700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta-700 035</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road</s1>
<s2>Calcutta-700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta-700 035</wicri:noRegion>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint>
<date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic recognition</term>
<term>Character recognition</term>
<term>Database</term>
<term>Expression evaluation</term>
<term>Mathematical formula</term>
<term>Optical character recognition</term>
<term>Performance evaluation</term>
<term>Probabilistic approach</term>
<term>Probability learning</term>
<term>Segmentation</term>
<term>Statistical analysis</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance automatique</term>
<term>Base donnée</term>
<term>Apprentissage probabilités</term>
<term>Formule mathématique</term>
<term>Evaluation performance</term>
<term>Evaluation expression</term>
<term>Analyse statistique</term>
<term>Approche probabiliste</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Base de données</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper is concerned with research on OCR (optical character recognition) of printed mathematical expressions. Construction of a representative corpus of technical and scientific documents containing expressions is discussed. A statistical investigation of the corpus is presented, and the usefulness of this analysis is demonstrated in the related research problems, namely, (i) identification and segmentation of expression zones from the rest of the document, (ii) recognition of expression symbols, (iii) interpretation of expression structures, and (iv) performance evaluation of a mathematical expression recognition system. Moreover, a groundtruthing format has been proposed to facilitate automatic evaluation of expression recognition techniques.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Inde</li>
</country>
<region>
<li>Bengale-Occidental</li>
</region>
<settlement>
<li>Calcutta</li>
</settlement>
<orgName>
<li>Institut indien de statistiques</li>
</orgName>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Garain, Utpal" sort="Garain, Utpal" uniqKey="Garain U" first="Utpal" last="Garain">Utpal Garain</name>
</noRegion>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
</country>
</tree>
</affiliations>
</record>
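
As an illustration of how the keyword lists in the record above might be read programmatically, here is a minimal Python sketch. It is not part of Dilib or of the Wicri toolchain. It works on a shortened copy of the `<textClass>` element rather than on the full record, because the full record uses the `wicri:` and `inist:` prefixes without declaring them, which a namespace-aware parser would likely reject. The function name `terms_by_scheme` and the constant `FRAGMENT` are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <textClass> element from the record above
# (assumption: only a few terms are kept, for brevity).
FRAGMENT = """
<textClass>
  <keywords scheme="KwdEn" xml:lang="en">
    <term>Automatic recognition</term>
    <term>Character recognition</term>
  </keywords>
  <keywords scheme="Pascal" xml:lang="fr">
    <term>Reconnaissance caractère</term>
  </keywords>
</textClass>
"""

# ElementTree exposes xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def terms_by_scheme(xml_text):
    """Map (scheme, language) to the list of terms in each <keywords> block."""
    root = ET.fromstring(xml_text)
    result = {}
    for keywords in root.iter("keywords"):
        key = (keywords.get("scheme"), keywords.get(XML_LANG))
        result[key] = [term.text for term in keywords.findall("term")]
    return result


if __name__ == "__main__":
    for (scheme, lang), terms in terms_by_scheme(FRAGMENT).items():
        print(scheme, lang, terms)
```

To apply the same idea to the whole record, the undeclared prefixes would first have to be bound to namespace URIs; which URIs are appropriate is not stated in the record, so that step is left open here.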

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001424 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001424 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:06-0054256
   |texte=   A corpus for OCR research on mathematical expressions
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024